Morphoanalysis of Spanish Texts: Two Applications for Web Pages

نویسندگان

  • Octavio Santana Suárez
  • Zenón José Hernández Figueroa
  • Gustavo Rodríguez Rodríguez
چکیده

The applications described here follow up the works performed in the recent last years by the Data Structures and Computational Linguistics Group at Las Palmas de Gran Canaria University. These works have been developed about computational Linguistics and, as one of their results, some tools for morphologic identification and generation have been released. This work presents the use of those tools as parts of new applications designed to benefit from the great linguistic information flow from Internet. Two kinds of applications are identified, both according to the interactive grade of the linguistics studies to be done, and two prototypes, named DAWeb and NAWeb, are developed with special attention to their architecture in order to maximize the efficiency of both. Analysis modes include: neologism detection, word use (qualitative and quantitative measurements) and some syntax aspects like lexical collocations or prepositional regimes.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

متن کامل

بررسی ارتباط بین کیفیت اطلاعات و شاخص های ظاهری در صفحات وب فارسی مرتبط با حوزه سلامت عمومی

  Introduction: One approach to evaluate the quality of a web page is to investigate its external markers. The purpose of the present study is to determine the relationship between information quality of Persian public health web pages and their external quality.   Methods: The samples of this correlation study were selected from among the freely available ten-key word texts of chronic diseases...

متن کامل

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...

متن کامل

Babylon Parallel Text Builder: Gathering Parallel Texts for Low-Density Languages

This paper describes BABYLON, a system that attempts to overcome the shortage of parallel texts in low-density languages by supplementing existing parallel texts with texts gathered automatically from the Web. In addition to the identification of entire Web pages, we also propose a new feature specifically designed to find parallel text chunks within a single document. Experiments carried out o...

متن کامل

بهینه‌سازی اجرا و پاسخ صفحات وب در فضای ابری با روش‌های پیش‌پردازش، مطالعه موردی سامانه‌های وارنیش و انجینکس

The response speed of Web pages is one of the necessities of information technology. In recent years, renowned companies such as Google and computer scientists focused on speeding up the web. Achievements such as Google Pagespeed, Nginx and varnish are the result of these researches. In Customer to Customer(C2C) business systems, such as chat systems, and in Business to Customer(B2C) systems, s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003